EXECUTIVE SUMMARY


This report discusses the evaluation of a commercially available speech recognition system
on the NATO Native and Non-Native (N4) database. Using statistical language modeling
techniques, trigram language models were generated for three of the countries in the database:
Canada (CA), the Netherlands (NL), and the United Kingdom (UK). Due to time constraints, the
German (DE) data was not evaluated. For each country, two factors were assessed: overall word
accuracy and callsign accuracy. Only standard American English acoustic models were used in
this evaluation. Results of each country evaluation are discussed.

INTRODUCTION


Commercially  available  speech  recognition  systems  are  finally  reaching  a  level  of  maturity  to 
be  considered  for  various  military  applications  [1]  [2]  [3]  [4].  These  applications  range  from 
ground-based  command  and  control  operations  in  an  air  operations  center  to  tactical  command 
and  control  in  a  high  performance  fighter  aircraft.  Another  application  that  is  of  interest  to  the 
military  is  in  the  area  of  training.  The  use  of  speech  recognition  technology  to  act  as  synthetic 
players  in  training  exercises  promises  to  greatly  reduce  the  manpower  required  to  train  personnel 
for various tasks, such as air traffic control, AWACS operations, and other communications
tasks. A significant challenge for speech technology is to act as a performance
assessment tool that automatically grades a student's ability to perform a given
communications task correctly. The challenge grows when the student performs the
communications task in non-native English. To see whether commercial off-the-shelf (COTS)
technology is up to this challenge, an evaluation was performed on the NATO Native and
Non-Native (N4) database [5], which consists of students from four countries conducting naval
communications training sessions: Canada (CA), United Kingdom (UK), Netherlands (NL), and
Germany (DE). Of particular interest was how well the COTS system could recognize not only
the individual words but also the various callsigns
spoken during the training sessions. This report discusses the development of the language
models  and  the  resulting  word  and  callsign  accuracy  obtained  from  three  of  the  countries 
represented  in  the  database,  CA,  UK,  and  NL.  Due  to  time  constraints,  the  DE  data  was  not 
evaluated. 


PROCEDURE 


Language  Model  &  Callsign  Interpretation  Grammar  Development 

A  separate  statistical  language  model  (SLM)  was  developed  for  each  of  the  three  countries. 
For  each  model,  the  transcripts  were  modified  to  replace  specific  callsign  references  with  a 
generic  Callsign  grammar  placeholder.  A  trigram  SLM  was  generated  from  the  modified 
training data. A unique callsign interpretation grammar was developed for each country based
on an analysis of the format and frequency of occurrence of callsigns. In addition to the
callsign grammars, several other grammars were developed to improve callsign detection
accuracy. These included grammars for authentication codes and Zulu time. The specific
interpretation grammars for each country are outlined in Appendices A-C. Nodes drawn
with a dotted line are optional. For all three countries tested, the standard American
English  acoustic  models  provided  with  the  system  were  used. 
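
The transcript preparation described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the evaluation: the callsign list, the <CALLSIGN> token name, and the sentence markers <s> and </s> are assumptions made for the example, and a real SLM toolkit would go on to smooth these counts into probabilities.

```python
import re
from collections import Counter

# Hypothetical callsign inventory; the real callsigns come from the N4 transcripts.
CALLSIGNS = ["alfa two bravo", "charlie nine"]
PLACEHOLDER = "<CALLSIGN>"  # assumed name for the generic grammar slot


def replace_callsigns(line, callsigns=CALLSIGNS):
    """Replace each known callsign phrase with the generic placeholder."""
    # Longest phrases first so a short callsign never splits a longer one.
    for phrase in sorted(callsigns, key=len, reverse=True):
        line = re.sub(r"\b" + re.escape(phrase) + r"\b", PLACEHOLDER, line)
    return line


def trigram_counts(lines):
    """Count word trigrams, padding each utterance with sentence markers."""
    counts = Counter()
    for line in lines:
        words = ["<s>", "<s>"] + line.split() + ["</s>"]
        counts.update(zip(words, words[1:], words[2:]))
    return counts


transcripts = [
    "charlie nine this is alfa two bravo over",
    "alfa two bravo this is charlie nine roger out",
]
modified = [replace_callsigns(t) for t in transcripts]
counts = trigram_counts(modified)
```

Because both utterances collapse to the same "<CALLSIGN> this is <CALLSIGN>" pattern, the trigram counts are shared across callsigns, which is the point of the placeholder: the SLM learns the message structure while the grammar handles the callsign itself.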

Data  Preparation 

Prior  to  the  evaluation,  several  steps  were  necessary  to  prepare  the  source  material.  First, 
individual WAV files were generated based on the transcription data provided. Next, each WAV
file was downsampled to 8 kHz to match the requirements of the COTS system's acoustic model.
Recognition testing was performed on each data set with several default parameters modified
based on prior experience with this system on similar speech data. These modifications included
enabling a noise filtering process to improve the signal, reducing the rejection threshold to
reduce rejection errors, and increasing the pruning value to improve accuracy. All recognition
data was captured in log files for subsequent analysis.

RESULTS 

The  results  for  each  country  evaluation  were  parsed  into  two  separate  data  sets.  The  first  set 
contained  the  raw  recognition  text  result  returned  by  the  system.  The  second  set  contained  only 
a  list  of  callsigns  detected  by  the  callsign  interpretation  grammars.  These  data  sets  were  then 
formatted into spu_id input files for analysis by sclite, a NIST-developed scoring program
commonly used to score recognition tests.

Raw  Text  Transcription  Results 

The  first  metric  of  interest  was  how  well  the  COTS  system  performed  on  the  raw 
transcription  task.  The  results  for  all  three  countries  are  presented  in  Table  1. 


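                              CA                NL                UK
Performance Metric         count    (%)      count    (%)      count    (%)

Sentence Recognition Performance
Total Sentences              809              [?]              [?]
Total Errors                 767   94.8%      [?]    [?]       [?]    [?]
Substitutions                457   56.5%      [?]    [?]       [?]    [?]
Deletions                    612   75.6%      [?]    [?]       [?]    [?]
Insertions                   255   31.5%      110    [?]        58    [?]

Word Recognition Performance
Total Words                11555             4520              [?]
Total Errors                3434   29.7%      [?]   24.6%      [?]   22.1%
Substitutions                [?]    [?]       766    [?]       [?]   10.5%
Deletions                    [?]    [?]       172    [?]       [?]    9.5%
Insertions                   [?]    [?]       175    3.9%      [?]    2.1%
Correct                     8535    [?]      3582    [?]       [?]   80.0%
Word Accuracy                      70.3%             [?]             77.9%

[?] marks a value that is missing or illegible in the source. The sentence rows of the
source also give a sentence total of 327, an error count of 273, and an error rate of 70.7%
for NL or UK; the column each belongs to cannot be determined.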

Table  1.  Sentence  and  Word  Error  Rates  for  Transcription  Task. 


Callsign  Detection  Results 


The  second  item  of  interest  was  how  well  the  system  could  recognize  and  label  callsign  data 
within  a  given  utterance.  For  purposes  of  scoring,  each  callsign  was  considered  a  single  token  or 
word.  Also,  a  sentence  was  simply  a  sequence  of  callsigns  detected  in  the  original  utterance. 
The  results  for  all  three  countries  are  presented  in  Table  2. 
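
The scoring convention above, one token per callsign and one "sentence" per utterance, can be illustrated as follows. The sketch assumes the "text (utterance-id)" transcript line layout that sclite accepts; the utterance ID and callsigns shown are hypothetical, and joining words with underscores is only one possible way to make each callsign a single token.

```python
def callsign_tokens(callsigns):
    """Collapse each multi-word callsign into one token so it scores as one word."""
    return ["_".join(c.split()) for c in callsigns]


def to_transcript_line(tokens, utt_id):
    """Format one utterance as 'tok1 tok2 ... (utt_id)'."""
    return " ".join(tokens) + " (" + utt_id + ")"


# Hypothetical reference and recognizer output for the same utterance.
utt_id = "CA000-00-0"
ref = to_transcript_line(callsign_tokens(["charlie nine", "alfa two bravo"]), utt_id)
hyp = to_transcript_line(callsign_tokens(["charlie five", "alfa two bravo"]), utt_id)
```

Scored this way, a single misrecognized word inside a callsign ("nine" heard as "five") counts as one substituted callsign, and the utterance as a whole counts as one sentence in error.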


                              CA                NL                UK
Performance Metric         count    (%)      count    (%)      count    (%)

Sentence Recognition Performance
Total Sentences              809              321              324
Total Errors                 485   60.0%      246   76.6%      181   55.9%
Substitutions                330   40.8%      218   67.9%      154   47.5%
Deletions                     73    9.0%       17    5.3%       12    3.7%
Insertions                   222   27.4%      101   31.5%       39   12.0%

Word Recognition Performance
Total Words                 1217              554              519
Total Errors                 802   65.9%      438   79.1%      248   47.8%
Substitutions                381   31.3%      295   53.2%      173   33.3%
Deletions                    106    8.7%       20    3.6%       27    5.2%
Insertions                   315   25.9%      123   22.2%       48    9.2%
Correct                      730   60.0%      239   43.1%      319   61.5%
Word (Callsign) Accuracy           34.1%             20.9%            52.2%

Table  2.  Callsign  Detection  Results. 
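
The word-level rows of the tables are related by simple arithmetic: total errors are the sum of substitutions, deletions, and insertions, "correct" excludes only substitutions and deletions, and accuracy additionally penalizes insertions. A short sketch, using the CA callsign counts from Table 2 (the function and key names are chosen for this example only):

```python
def word_scores(total, subs, dels, ins):
    """Derive percentage scores from raw alignment counts."""
    errors = subs + dels + ins          # every alignment error
    correct = total - subs - dels       # reference words recognized correctly
    return {
        "error_rate": 100.0 * errors / total,
        "correct": 100.0 * correct / total,
        "accuracy": 100.0 * (total - errors) / total,
    }


# CA callsign counts from Table 2: 1217 words, 381 subs, 106 dels, 315 ins.
ca = word_scores(total=1217, subs=381, dels=106, ins=315)
rounded = {name: round(value, 1) for name, value in ca.items()}
```

The rounded values reproduce the CA column of Table 2 (65.9% errors, 60.0% correct, 34.1% accuracy). The gap between percent correct and accuracy is the insertion rate, which is why a system can find most callsigns and still score poorly when it also reports callsigns that were never spoken.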


DISCUSSION 

This  database  represented  a  significant  challenge  for  evaluation.  Not  only  was  there  a 
significant  amount  of  disfluent  speech  present,  but  the  addition  of  non-native  English  speakers 
proved  very  difficult  for  the  COTS  system.  To  be  fair,  the  system’s  American  English  acoustic 
models  were  not  very  representative  of  much  of  the  database.  Also,  very  little  fine  tuning  of 
pronunciation  dictionaries  was  performed  due  to  time  constraints  in  the  evaluation.  This  was  a 
particular  problem  in  the  NL  evaluation  with  many  Dutch  words  interspersed  among  the  English 
words.  Additional  performance  benefits  could  be  obtained  if  some  adaptation  was  performed  on 
the  standard  acoustic  models  and  if  dictionaries  were  tuned. 

Another  problem  encountered  in  the  evaluation  was  the  length  of  several  of  the  test 
utterances.  The  COTS  system  tested  only  accepts  utterances  under  30  seconds  in  duration. 
Many  of  the  utterances  exceeded  this  length.  Appendix  D  shows  the  list  of  utterances  for  each 
country  that  could  not  be  evaluated.  Additional  effort  could  be  expended  in  splitting  the 
utterances  into  smaller  segments  and  then  evaluating  these  segments  against  the  COTS  system. 
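
As a rough illustration of the splitting idea, the sketch below computes equal-length segment boundaries that each stay under the 30-second limit. It is an assumption-laden simplification: the one-second safety margin is arbitrary, and cutting at fixed times can fall in the middle of a word, so a practical splitter would move each boundary to a nearby pause.

```python
import math

MAX_SECONDS = 30.0  # utterance length limit stated in the report


def split_points(duration, max_len=MAX_SECONDS, margin=1.0):
    """Return (start, end) times of equal-length segments, each under max_len."""
    limit = max_len - margin                      # keep a safety margin below the limit
    n = max(1, math.ceil(duration / limit))       # fewest segments that fit
    step = duration / n
    return [(i * step, (i + 1) * step) for i in range(n)]


short = split_points(12.0)   # already under the limit: one segment
long = split_points(70.0)    # needs three segments of about 23.3 s each
```

Each segment would then be recognized separately and the results concatenated before scoring against the original reference transcript.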

APPENDIX A:

CALLSIGN  GRAMMARS  FOR  CA  DATA 


[Grammar diagram not recoverable from the source. Legible fragment: Callsign (allowed up to 4 times).]

APPENDIX B:

CALLSIGN  GRAMMARS  FOR  NL  DATA 


[Grammar diagrams not recoverable from the source. The figure shows the Source_CS,
Dest_Source_CS, Ending_CS, Authenticate, Auth_code, and Time grammars. Legible fragments
indicate the following: the callsign grammars are built around Callsign nodes, with "this"
appearing in Source_CS; callsigns are formed from short sequences of letters (alfa - zulu)
and digits (zero - nine); Authenticate accepts "authenticate", "for", and "authentication"
together with an Auth_code made of letters (alfa - zulu); and Time accepts the word "time"
and four digits (0-9) followed by "zulu".]


APPENDIX  C: 

CALLSIGN  GRAMMARS  FOR  UK  DATA 


[Grammar diagrams not recoverable from the source. The figure shows the Source_CS,
Dest_Source_CS, Ending_CS, and Mid_CS grammars and a Zulu time grammar (label garbled in
the source as "Zulu_zulu"). Legible fragments indicate the following: the callsign grammars
combine Callsign nodes with the words "this", "is", "over", "roger", and "wilco"; Callsign
nodes are annotated "allowed up to 3 times" and "allowed up to 6 times"; callsigns are
formed from short sequences of letters (alfa - zulu) and digits (zero - nine); and the time
grammar accepts digits (0-9) followed by "zulu".]

APPENDIX D:

UTTERANCES  NOT  EVALUATED 


CA

CA001-01-7    CA001-01-19   CA001-01-21   CA001-03-23
CA001-03-26   CA001-01-28   CA001-03-38   CA001-03-41
CA001-06-43   CA001-03-48   CA001-03-54   CA001-02-55
CA002-05-1    CA002-03-8    CA002-05-35   CA003-05-7
CA003-02-12   CA003-02-16   CA003-02-19   CA003-02-21
CA003-02-22   CA003-03-24   CA003-04-26   CA003-03-28
CA003-02-35   CA003-04-38   CA003-02-40   CA003-03-41
CA003-04-44   CA003-03-60   CA003-05-63   CA004-05-1
CA004-02-7    CA004-02-9    CA004-01-13   CA004-03-17
CA004-01-22   CA004-01-24   CA004-01-39   CA004-02-47
CA004-03-49   CA005-U-2     CA005-08-4    CA005-10-10
CA005-U-12    CA005-15-19   CA005-16-21   CA005-U-23
CA005-07-27   CA005-U-29    CA005-11-52   CA005-12-54
CA005-12-56   CA006-14-20   CA006-15-22   CA006-16-35
CA006-08-47   CA007-14-1    CA007-U-18    CA007-17-25
CA007-07-29   CA007-07-38   CA007-08-42   CA007-08-44
CA007-09-48   CA008-09-15   CA008-08-18   CA008-08-23
CA008-13-26   CA008-07-30   CA008-07-32   CA009-07-32
CA009-12-38   CA009-16-40   CA009-14-46   CA009-12-52
CA009-07-79   CA010-14-18   CA010-10-45   CA010-09-51
CA010-08-55   CA010-07-57   CA010-07-63   CA011-21-1
CA011-21-6    CA011-21-23   CA011-21-36   CA011-21-81
CA011-21-97   CA011-19-99   CA011-21-100  CA011-21-142
CA011-21-149  CA011-U-156   CA011-U-169   CA011-21-191
CA011-21-193  CA011-21-230  CA011-21-237


NL

NL001-07-56   NL001-10-59   NL001-08-61   NL001-05-62
NL001-05-63   NL001-01-75   NL002-04-1    NL002-09-4
NL002-09-5    NL002-02-10   NL002-09-13   NL002-03-15
NL002-06-19   NL002-10-22   NL003-07-1    NL003-09-6
NL003-09-10   NL003-05-14   NL003-02-23   NL003-02-27
NL003-11-31   NL003-08-32   NL003-01-35   NL003-03-43
NL003-03-45   NL003-08-51   NL003-10-53   NL004-11-2
NL004-09-4    NL004-05-6    NL004-02-11   NL004-04-12
NL004-XX-14   NL004-08-25   NL004-04-28   NL004-10-31
NL004-07-33   NL005-19-11   NL005-21-15   NL006-13-1
NL006-18-3    NL006-17-14   NL006-14-16   NL007-22-1
NL007-23-9    NL007-12-20   NL008-16-5    NL008-18-8
NL008-15-10   NL008-13-13   NL009-22-3    NL009-22-5
NL009-14-13   NL009-19-15   NL009-16-17   NL009-16-20
NL009-21-23   NL009-18-25   NL009-18-26   NL010-18-3
NL010-20-6    NL010-20-9    NL010-23-11   NL010-12-12
NL010-22-13   NL010-17-15   NL010-18-17   NL010-17-18
NL011-24-17   NL011-26-20   NL011-24-34   NL011-27-39
NL011-29-41   NL011-26-46   NL011-25-56   NL011-26-57
NL012-30-1    NL012-24-3    NL012-28-9    NL013-30-3
NL013-24-20   NL013-30-43


UK

UK001-01-1    UK001-01-3    UK001-01-6    UK001-01-8
UK001-01-12   UK001-04-14   UK001-01-16   UK001-01-32
UK001-01-36   UK001-05-44   UK001-02-50   UK001-04-62
UK001-08-85   UK001-03-89   UK001-06-93   UK002-06-1
UK002-03-12   UK002-08-41   UK002-08-44   UK002-08-50
UK002-08-52   UK002-08-55   UK002-08-59   UK002-08-81
UK003-06-2    UK003-02-10   UK003-12-20   UK003-02-39
UK003-12-44   UK003-07-45   UK003-10-49   UK003-10-52
UK003-12-58   UK003-07-60   UK003-12-68   UK003-12-73
UK003-12-75   UK003-12-83   UK003-12-90   UK003-12-94
UK003-10-96   UK003-01-98   UK003-01-101  UK003-02-103
UK003-06-108  UK003-01-111  UK004-06-1    UK004-06-4
UK004-10-9    UK004-06-14   UK004-06-18   UK004-10-21
UK004-06-24   UK004-10-38   UK004-06-41   UK004-12-43
UK004-12-46   UK004-06-48


